ABSTRACT

In this paper, we discuss how loop level parallelism is detected in a 
nonprocedural dataflow program and how a procedural program with concurrent 
loops is scheduled. In addition, we discuss a program restructuring technique 
which may be applied to recursive equations so that concurrent loops may be 
generated for a seemingly iterative computation. A compiler which generates 
C code for the language described below has been implemented. We describe
the scheduling component of the compiler and the restructuring transformation. 
Exploiting Loop Level Parallelism in Nonprocedural Dataflow Programs

Maya B. Gokhale 

Department of Computer and Information Sciences 
University of Delaware 


1 Introduction 

Loop level parallelism has been recognized as having a major impact on the performance of parallel
programs on MIMD machines. Most parallel languages contain some sort of forall construct (for
example, see [7] or [12]), and much effort has been directed towards detecting forall loops in
sequential programs (for a small sampling of the literature in this area, see [1], [3], [4], [8], [9], [13], [16])
and automatically generating parallel versions of the sequential programs.

In this work, we also address the question of automatically generating forall loops, but from a 
different perspective. Our starting point is a very high level dataflow language PS (similar to [2], 
[17], [19], and [18]), attractive because of its functional semantics which greatly facilitate program 
restructuring. 

To demonstrate this, we show how we can transform certain forms of subscript expressions used 
to index recursively-defined arrays so that concurrent loops may be generated instead of sequential 
ones. Although these transformations are also applicable to sequential programs, a more careful 
analysis of reassignment and aliasing may be required in the sequential program. In some cases, 
the programmer’s use of aliasing and reassignment may prevent the transformations from being 
applied. 

In addition to the benefits of its functional semantics, the language PS is attractive for the close 
correspondence in form between equations used to describe numerical algorithms and equations 
in PS. In fact, we may consider the language to be an internal (albeit textual) representation of 
equations such as Equation 1 below. Our ultimate goal is a translator of equations in the form of
(1), perhaps as TeX or PostScript files, to modules in this language.

We are implementing a compiler for PS, currently 24,000 lines of Pascal, which generates
declarations and functions in the C language. The compiler automatically collects equations into
groups for which C for loops are generated. Each loop is annotated to indicate whether it is an
iterative or concurrent for. The subscript transformation described in Section 4 has been developed
independently, and is being integrated into the compiler.

In this paper, we describe the scheduling phase of the compiler, with particular emphasis on 

• differentiating between parallel and sequential loops, 

• memory reuse in the generated imperative code, and 

• array transformation to facilitate parallel loop generation and memory reuse. 


2 The PS Language 

The Problem Specification (PS) language [6] is a very high level dataflow language. A program in 
this language consists of one or more module descriptions, where a module is simply a functional 
unit, taking 0 or more input parameters and returning 1 or more results. Internal to a module
the data declarations resemble Pascal or Modula-2. There may be user-defined types or variables 
declared. Standard Pascal data types are provided (primitive types, enumerations, arrays, records). 
In place of the procedural code, however, PS has a define section consisting of equations defining 
values for all non-input variables. The equations may be entered in any order. An equation in PS 
is a restricted form of mathematical equation in that the left hand side of the equation is either a
single variable or a list of variables, and the right hand side is an expression of the same arity and
type as the left hand side. A scheduling phase of the compiler derives from the data dependency
graph of the equations an ordering for the procedural code which is emitted.

Example: Let us take a simplified version of standard relaxation, where for non-boundary
elements and for k > 1,

$$A^{k}_{i,j} = \left(A^{k-1}_{i,j-1} + A^{k-1}_{i-1,j} + A^{k-1}_{i,j+1} + A^{k-1}_{i+1,j}\right)/4 \qquad (1)$$

Note that in this example, all element values are taken from the previous iteration. In PS the 
superscripts (iteration number) and subscripts (array element) are not differentiated. All of them 
are put in as subscripts: 

A[K,I,J] = ( A[K-1,I,J-1]
            +A[K-1,I-1,J]
            +A[K-1,I,J+1]
            +A[K-1,I+1,J] ) / 4;
To make a PS module out of this fragment, we first write the module header:

Relaxation: module (InitialA: array [I, J] of real;
                    M: int; maxK: int) :
                   [newA: array [I, J] of real];

InitialA is the input array of dimension M x M, maxK is the number of iterations desired, and 
newA is the array returned as the module result. 

Next, we define types and local variables: 


type
    I, J = 0 .. M+1; K = 1 .. maxK;
var
    A: array [K] of array [I, J] of real;


We have defined subrange types I and J as ranging from 0 to M + 1, so that the boundary may be
padded with 0's. K is the superscript from the initial formula. Since A has dimensionality which is
the sum of subscripts and superscripts,¹ it is declared as a local 3-dimensional array.

Now we insert the equations. The initial value of A is simply the input array InitialA. The
result newA is the value of the maxK'th element of A.

A[1] = InitialA;    (* the first grid is input *)
newA = A[maxK];     (* the grid returned is from the last iteration *)

Next we give the equation defining the other elements of A, including the boundary values. This 
equation uses an if expression to determine whether an element is a boundary value or an interior 
value. 

A[K,I,J] = if (I = 0)    (* carry over boundary points *)
           or (J = 0)
           or (I = M+1)
           or (J = M+1)
           then A[K-1,I,J]
           else ( A[K-1,I,J-1]
                 +A[K-1,I-1,J]
                 +A[K-1,I,J+1]
                 +A[K-1,I+1,J] ) / 4;

¹ In keeping with other single assignment languages, a value is never changed. Rather a new value is generated
from a computation involving the old value.


The entire module is shown in Figure 1. 
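
The meaning of the three equations can be checked against a direct evaluation. The following Python sketch is not output of the PS compiler; it is a hand-written reference under the assumption that grids are (M+2) x (M+2) lists of lists, and the function name `relaxation` is chosen here for illustration only. It keeps every grid A[K], as the PS declaration does.

```python
def relaxation(initial_a, m, max_k):
    """Evaluate eq.1, eq.3 and eq.2 of the Relaxation module directly."""
    n = m + 2                                   # I and J range over 0 .. M+1
    a = {1: [row[:] for row in initial_a]}      # eq.1: A[1] = InitialA
    for k in range(2, max_k + 1):               # eq.3 for K = 2 .. maxK
        prev = a[k - 1]
        cur = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == 0 or j == 0 or i == m + 1 or j == m + 1:
                    cur[i][j] = prev[i][j]      # carry over boundary points
                else:
                    cur[i][j] = (prev[i][j - 1] + prev[i - 1][j]
                                 + prev[i][j + 1] + prev[i + 1][j]) / 4
        a[k] = cur
    return a[max_k]                             # eq.2: newA = A[maxK]
```

Because every value of grid K is read only from grid K-1, the two inner loops carry no dependence between their iterations, which is the property the scheduler of Section 3 detects.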


3 Scheduling the Equations of a Module 

The PS compiler consists of three components, 

• the “front end” which does syntax and semantic analysis and stores the entire program in an 
internal form 

• the scheduler, which, on a module by module basis, builds a data dependency graph, analyzes 
the graph and generates a flowchart of execution ordering including, if necessary, iterative 
and parallel loops 

• the code generator which generates procedural code from the flowchart. 

In this paper we concentrate on the scheduler, beginning with a description of the dependency 
graph. 


3.1 The Dependency Graph 

The dependency graph is G = (N, E), where the set of nodes N contains the data items and equations
of the module, and E contains directed edges between nodes. A directed edge is drawn from node
i to node j if data produced in i is used in j. Thus the graph is simply a dataflow graph, showing
the flow of data from producer to consumer. There exist data dependency edges from all variables
on the right hand side of an equation to the equation, and from the equation to the variable on the
left hand side. In addition, data dependency edges are drawn from variables defining a subrange
bound to variables using that subrange. For example, a data dependency edge is drawn from M to
InitialA, to A, and to newA, since the bounds of these arrays depend on M. A data dependency
edge is drawn from maxK to A for the same reason. Besides the data dependency edges, certain
hierarchical edges also are drawn. These are used to show the relationship between the fields of a
record and the record itself, and do not concern us further in this example.

Each node and each edge is annotated with a list of labels. There is a node label for each
dimension of the node, e.g., an array A[K,I,J] has three node labels, describing respectively the
dimensions K, I and J. The edge labels contain information about the subscript expression used to
reference the source node. Figure 2 shows in further detail the attributes of the edge labels.

The dependency graph for the Relaxation Module is shown in Figure 3. 
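
As an illustration of the edge labels, the sketch below classifies the subscripts of the five references to A in eq.3. The names `EdgeLabel`, `classify` and `label_reference` are hypothetical, and the sketch assumes that each subscript sits in the same position in source and target, so `position` is simply the subscript's index.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EdgeLabel:
    position: int    # position in the target of this source subscript
    kind: str        # "I", "I-const", or "other"
    offset: int = 0  # only meaningful for "I-const"

def classify(expr, index, position):
    """Classify one subscript expression relative to its index variable."""
    text = expr.replace(" ", "")
    if text == index:
        return EdgeLabel(position, "I")
    prefix = index + "-"
    if text.startswith(prefix) and text[len(prefix):].isdigit():
        return EdgeLabel(position, "I-const", int(text[len(prefix):]))
    return EdgeLabel(position, "other")

def label_reference(subscripts, indices):
    """Build the label list for one reference such as A[K-1,I,J+1]."""
    return tuple(classify(e, idx, pos)
                 for pos, (e, idx) in enumerate(zip(subscripts, indices)))

# The five right hand side references to A in eq.3 of Figure 1.
REFS = [("K-1", "I", "J"), ("K-1", "I", "J-1"), ("K-1", "I-1", "J"),
        ("K-1", "I", "J+1"), ("K-1", "I+1", "J")]
EDGES = [("A", "eq.3", label_reference(r, ("K", "I", "J"))) for r in REFS]
```

Note that "J+1" and "I+1" fall into the "any other expression" class, which is what later prevents the I and J dimensions from being chosen before K.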
(*$m+v+x+t-*)
Relaxation: module (InitialA: array [I,J] of real;
                    M: int; maxK: int) :
                   [newA: array [I,J] of real];

type
    I,J = 0 .. M+1; K = 2 .. maxK;
var A: array [1 .. maxK] of array [I,J] of real;
       (* A denotes the succession of grids *)
define
(*eq.1*) A[1] = InitialA;   (* the first grid is input *)
(*eq.2*) newA = A[maxK];    (* the grid returned is from
                               the last iteration *)
(*eq.3*) A[K,I,J] = if (I = 0)   (* carry over boundary points *)
                    or (J = 0)
                    or (I = M+1)
                    or (J = M+1)
                    then A[K-1,I,J]
                    else ( A[K-1,I,J-1]
                          +A[K-1,I-1,J]
                          +A[K-1,I,J+1]
                          +A[K-1,I+1,J] ) / 4;
end Relaxation;


Figure 1: The Relaxation Module 
• Position in Target of this Source Subscript

• Subscript Expression Type

    - “I” as in A[I]

    - “I - constant” as in A[I-2]

    - any other expression

• Offset amount. Applicable only to “I - constant” subscript expression

Figure 2: Edge Label Attributes



Figure 3: Dependency Graph for Relaxation Module 
• Descriptor type: either Dependency Graph Node or Subrange Type

• If Subrange Type,

    - Is an iterative loop to be generated from this subrange or is a parallel loop to be
      generated?

    - List of descriptors which are nested within this subrange


Figure 4: A Flowchart Descriptor


3.2 Scheduler Output 

If the scheduler can determine an execution ordering for the equations, it generates a flowchart 
describing both the order of equations and the loop nesting structure in which the equations are 
embedded. The flowchart, then, is used by the code generator to emit the procedural code. The 
flowchart is simply a list of descriptors. A descriptor may indicate either a dependency graph node 
or a subrange type. The code generator, on encountering the former, emits code for the data item 
or the equation. The presence of the latter means that a for loop over the indicated subrange is 
to be generated. A subrange type descriptor also contains a list of descriptors which are contained 
within the scope of the loop. Thus the flowchart is a recursive structure which reflects the nesting 
structure of the generated program. The format of a descriptor is shown in Figure 4. 
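
The recursive shape of the flowchart can be sketched as follows. This is an illustrative Python rendering of the descriptor of Figure 4, not the compiler's Pascal data structure; the class and function names are assumptions made for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class NodeDescriptor:
    name: str                      # a dependency graph node, e.g. "eq.3"

@dataclass
class SubrangeDescriptor:
    subrange: str                  # loop subscript, e.g. "K"
    parallel: bool                 # True for a parallel loop, False for iterative
    body: list = field(default_factory=list)   # nested descriptors

def render(flowchart, depth=0):
    """Return the flowchart as indented lines in the DO / DOALL notation."""
    lines = []
    pad = "    " * depth
    for d in flowchart:
        if isinstance(d, NodeDescriptor):
            lines.append(pad + d.name)
        else:
            keyword = "DOALL " if d.parallel else "DO "
            lines.append(pad + keyword + d.subrange + " (")
            lines.extend(render(d.body, depth + 1))
            lines.append(pad + ")")
    return lines
```

A code generator would walk the same structure, emitting a for loop at each subrange descriptor and an assignment at each equation node.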

3.3 The Scheduling Algorithm 

The scheduling algorithm described here is a variant of the algorithm of [15]. Our algorithm is 
most similar to [5], which generates a schedule for subsequent code generation to a procedural 
data flow language. The algorithm described below does distinguish between iterative and parallel 
loops, but performs poorly in other respects, such as combining into a single loop those equations 
which though not recursively related, nevertheless depend on the same subscript(s). See [11] for a 
scheduling algorithm which produces only iterative loops, but does combine non-recursively related 
equations which depend on the same subscript(s), and [5] for the algorithm which distinguishes
between iterative and parallel loops, but does not combine iterative components which depend on 
the same subscript(s). 

The scheduler consists of two mutually recursive procedures. The first, Schedule-Graph, takes 
as input a dependency graph, and returns a flowchart. The second, Schedule-Component, sched- 
ules a Maximally Strongly Connected Component (MSCC) of a dependency graph, and returns a 
flowchart. 

Schedule-Graph operates as follows: 
1. Find the MSCC's of the graph, {M_i}, i = 1 ... n, where n is the number of MSCC's.

2. Initialize the flowchart to null.

3. For each M_i,

    (a) Call Schedule-Component with M_i as input.

    (b) Concatenate the result returned by Schedule-Component onto the flowchart.

4. Return the flowchart.

Schedule-Component, in turn, does the following: 

1. If the component consists of exactly one data node, exit with a null schedule. 

2. Pick an unscheduled node dimension to use as loop subscript. 

(a) If there are no more dimensions left to be scheduled and the graph contains more than 1 
node, then signal error and return: the equations cannot be scheduled by this algorithm. 

(b) If there are no more dimensions left to be scheduled and the graph contains exactly 1 
node then return as flowchart that single node. 

3. Otherwise verify that the subrange associated with that dimension appears in a consistent
position in each node of the component,² and that the only subscript expressions used in
that dimension are either “I” or “I - constant”.

4. Delete edges in M_i which contain subscript expressions of type “I - constant” in the dimension
being scheduled.³

5. Mark the dimension as scheduled. 

6. Create a flowchart descriptor for a Subrange Type. If “I - constant” edges were deleted, 
record that an iterative loop is to be generated, otherwise parallel. 

7. Call Schedule-Graph with the subgraph which results from deleting the “I - constant” edges. 

8. Concatenate the result returned by Schedule-Graph onto the Subrange Type flowchart de- 
scriptor created above, and return the resulting list. 
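
The two procedures can be sketched in Python as below. This is a simplified illustration, not the compiler's implementation, and it rests on several assumptions: nodes are a dict mapping a name to its kind and dimension list, each edge is a triple (source, target, subscript kinds per dimension) in which a missing dimension means a plain “I” subscript, the consistent-position check of step 3 is omitted, the first dimension that passes the subscript check is the one picked, and the M and maxK nodes are left out of the example graph. MSCC's are found by plain reachability, which is adequate only for small graphs.

```python
def reachable(start, nodes, edges):
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for s, d, _ in edges:
            if s == u and d in nodes and d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

def components(nodes, edges):
    """MSCC's, producers before consumers."""
    reach = {n: reachable(n, nodes, edges) for n in nodes}
    comps, placed = [], set()
    for n in nodes:
        if n not in placed:
            comp = [v for v in nodes if v in reach[n] and n in reach[v]]
            comps.append(comp)
            placed.update(comp)
    comps.sort(key=lambda c: -len(reach[c[0]]))   # more successors come first
    return comps

def schedule_graph(nodes, edges, done=()):
    flowchart = []
    for comp in components(nodes, edges):
        sub = {n: nodes[n] for n in comp}
        inner = [e for e in edges if e[0] in sub and e[1] in sub]
        flowchart += schedule_component(sub, inner, done)
    return flowchart

def schedule_component(nodes, edges, done):
    names = list(nodes)
    if len(names) == 1 and nodes[names[0]]["kind"] == "data":
        return []                                              # step 1
    dims = [d for n in names for d in nodes[n]["dims"] if d not in done]
    dims = list(dict.fromkeys(dims))                           # keep order, drop repeats
    if not dims:
        if len(names) > 1:                                     # step 2a
            raise ValueError("cannot be scheduled by this algorithm")
        return [names[0]]                                      # step 2b
    for dim in dims:                                           # steps 2 and 3
        if all(s.get(dim, "I") in ("I", "I-const") for _, _, s in edges):
            break
    else:
        raise ValueError("no schedulable dimension")
    kept = [e for e in edges if e[2].get(dim, "I") != "I-const"]   # step 4
    loop = "DO" if len(kept) < len(edges) else "DOALL"             # step 6
    body = schedule_graph(nodes, kept, done + (dim,))              # steps 5, 7
    return [(loop, dim, body)]                                     # step 8

def relaxation_graph(refs):
    """Graph of Figure 1; refs gives the subscript kinds of A's uses in eq.3."""
    ij, kij = ["I", "J"], ["K", "I", "J"]
    nodes = {"InitialA": {"kind": "data", "dims": ij},
             "eq.1": {"kind": "eq", "dims": ij},
             "A": {"kind": "data", "dims": kij},
             "eq.3": {"kind": "eq", "dims": kij},
             "eq.2": {"kind": "eq", "dims": ij},
             "newA": {"kind": "data", "dims": ij}}
    edges = [("InitialA", "eq.1", {}), ("eq.1", "A", {}), ("eq.3", "A", {}),
             ("A", "eq.2", {}), ("eq.2", "newA", {})]
    edges += [("A", "eq.3", r) for r in refs]
    return nodes, edges

# A[K-1,I,J], A[K-1,I,J-1], A[K-1,I-1,J], A[K-1,I,J+1], A[K-1,I+1,J]
EQ1_REFS = [{"K": "I-const"}, {"K": "I-const", "J": "I-const"},
            {"K": "I-const", "I": "I-const"}, {"K": "I-const", "J": "other"},
            {"K": "I-const", "I": "other"}]
```

Calling `schedule_graph(*relaxation_graph(EQ1_REFS))` yields a list of nested `(loop, dimension, body)` tuples in which eq.3 sits inside an iterative K loop and two parallel loops over I and J.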

Let us apply the scheduling algorithm to the relaxation dependency graph (Figure 3). The 
component graph and the flowchart for each component are shown in Figure 5. Input to Schedule- 

² For example, in the equation A[I,J] = A[I,J-1] + A[J,I] the subscripts I and J are not in a consistent position.
³ We can delete these recursive edges and still generate a correct schedule because if the loop iterates from the low
bound of the subrange to the high bound, a reference to (for example) A[I-2] will refer to an element of A which was
produced two iterations back.
Component   Node(s)     Flowchart
1           InitialA    null
2           M           null
3           maxK        null
4           eq.1        DOALL I (DOALL J (eq.1))
5           A, eq.3     DO K (DOALL I (DOALL J (eq.3)))
6           eq.2        DOALL I (DOALL J (eq.2))
7           newA        null

Figure 5: Component Graph and Corresponding Flowchart

Graph is the entire graph. The MSCC's of the graph are shown in Figure 5. Schedule-Graph
calls Schedule-Component successively with each component. The third column of Figure 5 shows the
flowchart returned by Schedule-Component for each of the components. Components 1, 2, 3, and
7 result in a null flowchart being returned by Schedule-Component at step 1 of the algorithm.

For Component 4, Schedule-Component chooses the first dimension to schedule (I) and
recursively calls Schedule-Graph with that component as input. Schedule-Graph in turn recursively
calls Schedule-Component with the same component, the first dimension of which is marked
scheduled. Schedule-Component now marks the second dimension (J) scheduled, and calls
Schedule-Graph with the same component as input. Schedule-Graph calls Schedule-Component. Now, by
step 2b of Schedule-Component, eq.1 is returned as result to Schedule-Graph, which returns it to
Schedule-Component, which concatenates it onto a “DOALL J” and returns “DOALL J eq.1” to
Schedule-Graph, which returns it to Schedule-Component, which concatenates it onto “DOALL I”
and returns “DOALL I DOALL J eq.1” to the original call of Schedule-Graph.

Component 6 is processed in exactly the same way as Component 4. Component 5, however,
is a multi-node MSCC. Schedule-Component picks the first dimension (K) to schedule first. The
other two cannot be chosen because of subscript expressions “J + 1” and “I + 1” (see
Schedule-Component step 3). All edges having subscript expression “K - 1” are deleted by step 4, and
Schedule-Graph is called recursively. The subgraph now has two components, eq.3 and A. Eq.3
can be scheduled in the I and J dimensions as outlined above for eq.1. The flowchart for A is null.
Thus the flowchart for Component 5 consists of an inner two level DOALL and an outer DO. The
final schedule returned by the outermost call to Schedule-Graph is shown in Figure 6.

DOALL I (
    DOALL J (
        eq.1
    )
)
DO K (
    DOALL I (
        DOALL J (
            eq.3
        )
    )
)
DOALL I (
    DOALL J (
        eq.2
    )
)

Figure 6: Flowchart for the Relaxation Module


3.4 Virtual Dimension

The code generation phase generates C declarations and assignment statements. For each variable,
either input parameter, output parameter, or local variable, an equivalent C declaration is
generated. Then, using the flowchart, the code generator emits for loops and assignment statements.

Thus array declarations are generated for each of the arrays InitialA, newA, and A. Now it is
obvious that allocating a three dimensional array for A is unnecessary,⁴ since in C and other
imperative languages the same function can be performed by a two dimensional array with reassignment.
The K dimension of A can be thought of as a “virtual” dimension rather than one physically allocated
in its entirety.

In general, a data node dimension is defined to be physical if the number of elements allocated at 
that dimension of the generated variable is the same as the number declared in the PS declaration. 
A data node dimension is virtual if the dimension is mapped to a “window” of elements, and the
width of the window is smaller than the PS declared size. For the array A, a window of two elements
is needed, the current one, K, and the previous, K-1.

The scheduler recognizes virtual data node dimensions during the Schedule-Component phase.
For each data node which is a local variable N_r in the component M_i, the node dimension being
scheduled is marked virtual if each edge from N_r to a node of type equation is in one or both of
the following forms:

1. The edge has subscript expression “I” or “I - constant” in the dimension being scheduled,
and the target is in M_i.

2. The edge goes to a node outside the component, and the edge has a subscript expression of
the form “N”, where “N” has been used as the upper bound of the subrange defining that
dimension. This type of edge indicates that only the last element at that dimension is used
outside the loop.

In the case of our example, local variable A is virtual in dimension 1. The other two dimensions
are not virtual for two reasons: first, they have edges with subscript expression “I + constant”,
and second, there are edges going out of the component which do not have the second form
of subscript expression in those dimensions. Therefore the scheduler marks the first dimension of
data node A virtual with window two, thereby directing the code generator to allocate only two
instances rather than maxK instances.
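
The effect of a window of two can be sketched as follows. This Python fragment is an illustration only (the compiler emits C), and it assumes that grid K is stored in slot K mod window; the function name `relax` and the `window` parameter are introduced here for the sketch. Passing a window of maxK + 1 gives every grid its own slot and so mimics full physical allocation.

```python
def relax(initial_a, m, max_k, window=2):
    """Run eq.3 while keeping only `window` grids of A allocated (window >= 2)."""
    n = m + 2
    slots = [[[0.0] * n for _ in range(n)] for _ in range(window)]
    slots[1 % window] = [row[:] for row in initial_a]      # A[1] = InitialA
    for k in range(2, max_k + 1):
        prev, cur = slots[(k - 1) % window], slots[k % window]
        for i in range(n):
            for j in range(n):
                if i in (0, m + 1) or j in (0, m + 1):
                    cur[i][j] = prev[i][j]                 # boundary carried over
                else:
                    cur[i][j] = (prev[i][j - 1] + prev[i - 1][j]
                                 + prev[i][j + 1] + prev[i + 1][j]) / 4
    return slots[max_k % window]                           # newA = A[maxK]
```

With `window=2` the grid for K overwrites the grid for K-2, which is safe because eq.3 reads only grid K-1 and only the last grid is used outside the loop.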

4 A Restructuring Transformation 

We now look at a slightly modified version of Equation 1. We will show that what appears to 
be a strictly iterative formulation can, with a shift of coordinate system, still result in a parallel 
loop, and that the “iteration” superscript need not really be an iteration subscript in the generated 
program. 

⁴ It is also obvious that generating a declaration for newA is unnecessary. Our solution to this problem is beyond
the scope of this paper.
Let us now solve the more standard (but still simplified) relaxation, for k > 1,

$$A^{k}_{i,j} = \left(A^{k}_{i,j-1} + A^{k}_{i-1,j} + A^{k-1}_{i,j+1} + A^{k-1}_{i+1,j}\right)/4 \qquad (2)$$

This results in equation 3 of the module becoming:


A[K,I,J] = if (I = 0)    (* carry over boundary points *)
           or (J = 0)
           or (I = M+1)
           or (J = M+1)
           then A[K-1,I,J]
           else ( A[K,I,J-1]
                 +A[K,I-1,J]
                 +A[K-1,I,J+1]
                 +A[K-1,I+1,J] ) / 4;


Now when we apply the scheduling algorithm, we find that deleting the K-1 edges leaves two
recursive edges, so that both the I and the J loop must be iterative.

The resulting schedule is shown in Figure 7. 

The virtual dimension analysis gives the same result as in the previous version: the first di- 
mension of A is virtual with window of two elements. Note that each dimension is scheduled 
independently, so that we do not detect the fact that only one array is needed. 

At this point, we take a closer look at the data dependencies of A. The dataflow graph for A 
in which each array element is a node (rather than the form used above in which there is a single 
node for the entire array) shows that all elements whose indices satisfy the equation 


$$2K + I + J = t, \qquad t = 1 \ldots 2 \times maxK + 2 \times M$$

can be computed at one time.

In this section we demonstrate a technique by which such parallelism can be detected. In 
particular, we will show how to find linear solutions to a set of inequalities representing the recursive
array dependencies. See [10] for a more complete treatment of the subject for constant offset 
recursive equations, and [14] for an extension to the method which handles certain forms of symbolic 
offsets in recursive equations. 

Our fundamental constraint is that data must be produced before it can be used. Thus A[K,I,J]
cannot be created until after A[K-1,I,J], A[K,I,J-1], A[K,I-1,J], A[K-1,I,J+1], and
A[K-1,I+1,J] are available.

We define the time of creation for each array element as a linear combination of the indices. 
For the recursively defined array A, this gives us the time equation 

$$t(A[K,I,J]) = aK + bI + cJ.$$



DOALL I (
    DOALL J (
        eq.1
    )
)
DO K (
    DO I (
        DO J (
            eq.3
        )
    )
)
DOALL I (
    DOALL J (
        eq.2
    )
)

Figure 7: Flowchart with Revised Eq.3
Our first problem is to solve for the coefficients a, b, and c.

We now represent the problem's dependence ordering with (strict) inequalities involving time.
In this case, the time for A[K,I,J] must come after the time for A[K-1,I,J], etc., which gives
us five dependence inequalities:

$$
\begin{aligned}
aK + bI + cJ &> a(K-1) + bI + cJ     &&\Rightarrow\; a > 0 \\
aK + bI + cJ &> aK + bI + c(J-1)     &&\Rightarrow\; c > 0 \\
aK + bI + cJ &> aK + b(I-1) + cJ     &&\Rightarrow\; b > 0 \\
aK + bI + cJ &> a(K-1) + bI + c(J+1) &&\Rightarrow\; a > c \\
aK + bI + cJ &> a(K-1) + b(I+1) + cJ &&\Rightarrow\; a > b
\end{aligned}
$$

Now we can find the least integers a, b, and c for which these dependence inequalities will hold.
In this case, we get a = 2 and b = c = 1, and arrive at the time equation 2K + I + J cited above.

All array elements A[K,I,J] such that 2K + I + J = t will be defined at time t. For given t,
these entries comprise a “hyperplane”. As t is increased from 0 to t_Max = 2K_Max + I_Max + J_Max,
we find a sequence of such hyperplanes which cover every point in the array.
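
For a small number of constant-offset dependences, the least coefficients can be found by exhaustive search. The sketch below is only an illustration of the inequalities above, not the method of [10]; it assumes each dependence is written as the offset (dK, dI, dJ) of the element read relative to the element defined, and it interprets "least" as the smallest sum a + b + c. The names `DEPS` and `least_coefficients` are hypothetical.

```python
from itertools import product

# Offsets of the five elements read by the revised eq.3:
# A[K-1,I,J], A[K,I,J-1], A[K,I-1,J], A[K-1,I,J+1], A[K-1,I+1,J]
DEPS = [(-1, 0, 0), (0, 0, -1), (0, -1, 0), (-1, 0, 1), (-1, 1, 0)]

def least_coefficients(deps, bound=10):
    """Smallest non-negative integers (a, b, c) making every source strictly earlier."""
    candidates = sorted(product(range(bound), repeat=3), key=sum)
    for a, b, c in candidates:
        # The source's time minus the target's time is a*dK + b*dI + c*dJ,
        # which must be negative for the source to be produced first.
        if all(a * dk + b * di + c * dj < 0 for dk, di, dj in deps):
            return a, b, c
    return None   # no solution with coefficients below `bound`
```

For the offsets in `DEPS` the search returns (2, 1, 1), matching the time equation 2K + I + J.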

We now define a new array A' related to A so that A'[K',I',J'] = A[K,I,J] and have
A'[K',I',J'] be constructed at time K'. Thus, we transform the coordinates K, I, J for the
array A into coordinates K', I', J' such that K' = t = 2K + I + J. A method for obtaining the I' and
J' dimensions after K' has been determined is given in [10]. In this example, we find that I' = K
and J' = I. Specifically,

$$K' = 2K + I + J, \qquad I' = K, \qquad J' = I$$

$$K = I', \qquad I = J', \qquad J = K' - 2I' - J'$$

$$A[K,I,J] = A'[K',I',J'] = A'[2K + I + J,\; K,\; I],$$

$$A'[K',I',J'] = A[K,I,J] = A[I',\; J',\; K' - 2I' - J']$$

Using these equalities, we derive the following recursive equation using A'.

$$
\begin{aligned}
A'[K',I',J'] &= A[I',\,J',\,K'-2I'-J'] \\
&= A[I'-1,\,J',\,K'-2I'-J'] \\
&\qquad \text{if } J'=0 \,\lor\, K'-2I'-J'=0 \,\lor\, J'=M+1 \,\lor\, K'-2I'-J'=M+1 \\
&= A'[K'-2,\,I'-1,\,J'] \\
&\qquad \text{if } J'=0 \,\lor\, K'-2I'-J'=0 \,\lor\, J'=M+1 \,\lor\, K'-2I'-J'=M+1 \\
&= \big(A[I',\,J',\,K'-2I'-J'-1] + A[I',\,J'-1,\,K'-2I'-J'] \\
&\qquad + A[I'-1,\,J',\,K'-2I'-J'+1] + A[I'-1,\,J'+1,\,K'-2I'-J']\big)/4 \quad \text{otherwise} \\
&= \big(A'[K'-1,\,I',\,J'] + A'[K'-1,\,I',\,J'-1] \\
&\qquad + A'[K'-1,\,I'-1,\,J'] + A'[K'-1,\,I'-1,\,J'+1]\big)/4 \quad \text{otherwise, by simplification}
\end{aligned}
$$

Applying the scheduling algorithm to the subgraph of this recursive equation gives us an outer
iteration, as before. However, once the “K' - constant” edges have been deleted, the I' and J'
dimensions can be scheduled as parallel loops (as in the example based on Equation 1) rather than
iterative. In fact, the schedule is identical to that of Figure 6.
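
The claim that the transformed loops compute the same values can be checked numerically. The Python sketch below is an illustration under stated assumptions: values are kept in a dictionary keyed by the original coordinates (K, I, J) so that the two results can be compared directly, and all values on one hyperplane are computed into a temporary dictionary before being committed, to show that they read only earlier hyperplanes. The function names are hypothetical.

```python
def update(a, k, i, j, m):
    """Right hand side of the revised eq.3 for element (k, i, j)."""
    if i in (0, m + 1) or j in (0, m + 1):
        return a[k - 1, i, j]
    return (a[k, i, j - 1] + a[k, i - 1, j]
            + a[k - 1, i, j + 1] + a[k - 1, i + 1, j]) / 4

def initial(initial_a, m):
    return {(1, i, j): initial_a[i][j] for i in range(m + 2) for j in range(m + 2)}

def sequential(initial_a, m, max_k):
    """The schedule of Figure 7: DO K (DO I (DO J (eq.3)))."""
    a = initial(initial_a, m)
    for k in range(2, max_k + 1):
        for i in range(m + 2):
            for j in range(m + 2):
                a[k, i, j] = update(a, k, i, j, m)
    return a

def wavefront(initial_a, m, max_k):
    """The transformed schedule: DO K' (DOALL I' (DOALL J' (eq.3)))."""
    a = initial(initial_a, m)
    for kp in range(4, 2 * max_k + 2 * (m + 1) + 1):      # K' = 2K + I + J
        plane = {}
        for ip in range(2, max_k + 1):                    # I' = K
            for jp in range(m + 2):                       # J' = I
                j = kp - 2 * ip - jp                      # J = K' - 2I' - J'
                if 0 <= j <= m + 1:
                    plane[ip, jp, j] = update(a, ip, jp, j, m)
        a.update(plane)                                   # commit the whole hyperplane
    return a
```

Since no element of a hyperplane reads another element of the same hyperplane, the two inner loops of `wavefront` could run in any order or concurrently.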

In addition, by using the transformed array A' instead of the original array A in the scheduling
algorithm, we now find that the first dimension of A' is virtual, since the only references are to K' - 1
or K' - 2. The window size is three, so that we can allocate an array of size
3 × I'_Max × J'_Max = 3 × maxK × M
rather than 2 × M × M, the space allocation of the purely iterative version.

In the final code which is generated, there are several alternatives in how the transformed array
is treated. We can flag arrays which have undergone this transformation, and replace each reference
to A'[K',I',J'] by A[I',J',K' - 2I' - J']. Alternatively, we could redefine A as a function which
retrieves the proper entry from A'. With a little more intelligence, we could rotate the input array
into A'[1], work entirely with the transformed array A' in the recurrence, and unrotate back into
the return parameter. The latter approach is preferable because a regular pattern of array reference
is established for the iteration which can be optimized (with respect to stride length) in procedural
multiprocessor compilers.

5 Conclusion 

We have presented a nonprocedural dataflow language and shown how the compiler for the language 
creates the flowchart of a procedural loop program with iterative and forall loops. We have shown 
how opportunities for storage reuse are detected by the scheduler, and that subscript transformation 
may be performed so that 1) an apparently iterative formulation can be transformed into a parallel 
one from which a parallel loop can be generated, and 2) storage reuse can be applied to the 
transformed array. 

Implementation effort is focussed on the following topics: 

• Integration of the array subscript restructuring algorithm into the compiler. 

• A graphical front end, which can translate Equation 1 or Equation 2 into PS.

• Improvement of the scheduler to better merge iterative loops. 